Alternative Strategies for Variable Selection in Linear Regression Models
Authors
Abstract
1. INTRODUCTION
1.1.1. Variable Selection for Incomplete Data Sets
In statistical practice, many real-life data sets are incomplete for reasons such as non-response or drop-out. When a data set is incomplete, practitioners frequently resort to a "case-deletion" strategy, in which the incomplete cases are excluded from the analysis and the complete cases form a reduced, rectangular complete-data matrix. Traditional variable selection approaches, such as stepwise or criterion-based ones, can then be applied to this reduced data matrix. The problem with case-deletion is that much statistical power may be lost, and unbiased estimates are obtained only when the missingness mechanism is missing completely at random (MCAR) (Little and Rubin 1987). A more satisfactory strategy for general missing-data problems is imputation, in which the missing data are replaced by plausible values. To retain the "uncertainty" due to missing data, Rubin (1987) developed multiple imputation, in which the missing data are imputed several times. The multiply imputed complete data sets are then analyzed separately, and the resulting parameter estimates are combined into a single estimate whose variance incorporates the between-imputation variance, which can be viewed as a measure of this uncertainty. Using a Markov chain Monte Carlo imputation approach, Schafer (1997) developed a program for incomplete multivariate normal data that achieves unbiased parameter estimation when the missingness mechanism is "ignorable" in the sense defined by Rubin (1976), that is, when the missing values are missing at random (MAR) and the parameters of the missingness mechanism are distinct from the model parameters. Ignorability is a less stringent condition than MCAR, and the missingness mechanisms of many practical incomplete data sets can be well approximated by it.
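As an illustration of the combining step, here is a minimal sketch of Rubin's (1987) rules for a single scalar parameter. It assumes each of the m imputed data sets has already been analyzed and has produced a point estimate and its squared standard error. The pooled estimate is the mean of the m estimates. The total variance adds the average within-imputation variance to the between-imputation variance, inflated by (1 + 1/m). The names `pool_rubin` and `PooledEstimate` are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean, variance


@dataclass
class PooledEstimate:
    estimate: float  # Q-bar: average of the m point estimates
    within: float    # U-bar: average within-imputation variance
    between: float   # B: sample variance of the m point estimates
    total: float     # T = U-bar + (1 + 1/m) * B


def pool_rubin(estimates, variances):
    """Combine m per-imputation results using Rubin's rules (scalar case)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = fmean(estimates)
    u_bar = fmean(variances)
    b = variance(estimates)  # divisor m - 1
    t = u_bar + (1 + 1 / m) * b
    return PooledEstimate(q_bar, u_bar, b, t)


# Hypothetical results from three imputed data sets.
print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

The between-imputation term B is the quantity the text describes as measuring the extra uncertainty due to missing data. If all imputations agreed, B would be zero and T would reduce to the ordinary within-imputation variance.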
For the purpose of variable selection, however, traditional non-Bayesian approaches cannot easily be applied within the framework of multiple imputation. The difficulty lies in how to combine the selection results from the multiple imputed data sets, because these non-Bayesian approaches may select different "best" linear regression models for different imputed data sets. When different predictors are included in these "best" models, the regression coefficients cannot simply be averaged, since their interpretations vary across models.
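To make the difficulty concrete, the sketch below runs an exhaustive BIC-based best-subset search separately on each completed data set. BIC is used here as one example of a criterion-based approach. The sketch returns the indices of the predictors selected in each data set. All helper names are hypothetical. The "imputation" in the usage part is a crude random draw from the observed values. It stands in for a proper imputation model such as Schafer's MCMC approach, and it is not the method of this paper.

```python
import itertools

import numpy as np


def bic_for_subset(X, y, cols):
    """BIC of an OLS fit of y on an intercept plus the predictor columns in `cols`."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1]  # number of fitted coefficients, intercept included
    return n * np.log(rss / n) + k * np.log(n)


def best_subset(X, y):
    """Criterion-based selection: the predictor subset with the lowest BIC."""
    p = X.shape[1]
    subsets = [c for r in range(p + 1) for c in itertools.combinations(range(p), r)]
    return min(subsets, key=lambda cols: bic_for_subset(X, y, cols))


def select_per_imputation(imputed_sets):
    """Run best-subset selection independently on each completed (X, y) pair."""
    return [best_subset(X, y) for X, y in imputed_sets]


# Usage: simulated data where the weakly relevant third predictor is partly missing.
rng = np.random.default_rng(0)
n = 60
X_full = rng.normal(size=(n, 3))
y = 1.0 + 0.8 * X_full[:, 0] + 0.15 * X_full[:, 2] + rng.normal(size=n)
missing = rng.random(n) < 0.3  # roughly 30% of the third predictor is missing

imputed_sets = []
for _ in range(5):
    X = X_full.copy()
    # Crude stand-in for an imputation model: resample observed values.
    X[missing, 2] = rng.choice(X_full[~missing, 2], size=int(missing.sum()))
    imputed_sets.append((X, y))

print(select_per_imputation(imputed_sets))
```

If the printed tuples differ across imputations, the fitted coefficient vectors belong to different models. A coefficient for a predictor that is absent from some models has no common meaning, so naive averaging is not justified.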
Similar resources
Penalized Bregman Divergence Estimation via Coordinate Descent
Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron, et al. (2004) introduced the LARS algorithm. Recently, the coordinate descent (CD) algorithm was developed by Friedman, et al. (2007) for penalized linear regression and penalized logistic regression and was shown to gain computational superiority. This paper explores...
A matrix method for estimating linear regression coefficients based on fuzzy numbers
In this paper, a new method for approximating linear regression coefficients based on Z-numbers is presented. In this model, the observations are real numbers, while the regression coefficients and the dependent variable (y) take Z-number values. To estimate the coefficients of this model, we first convert the linear regression model based on Z-numbers into two fuzzy linear regression mode...
An Overview of the New Feature Selection Methods in Finite Mixture of Regression Models
Variable (feature) selection has attracted much attention in contemporary statistical learning and recent scientific research. This is mainly due to the rapid advancement in modern technology that allows scientists to collect data of unprecedented size and complexity. One type of statistical problem in such applications is concerned with modeling an output variable as a function of a sma...
A Comparison between New Estimation and Variable Selection Methods in Regression Models by Using Simulation
In this paper, some new methods that have very recently been introduced for parameter estimation and variable selection in regression models are reviewed. Furthermore, we simulate several models in order to evaluate the performance of these methods under different situations. Finally, we compare the performance of these methods with that of the regular traditional variable selection methods such ...
Variable Selection via Partial Correlation.
A partial-correlation-based variable selection method was proposed for normal linear regression models by Bühlmann, Kalisch and Maathuis (2010) as a comparable alternative to regularization methods for variable selection. This paper addresses two important issues related to the partial-correlation-based variable selection method: (a) whether this method is sensitive to the normality assumption, an...
Potentials of Evolving Linear Models in Tracking Control Design for Nonlinear Variable Structure Systems
Evolving models have found applications in many real-world systems. In this paper, the potential of Evolving Linear Models (ELMs) in tracking control design for nonlinear variable structure systems is presented. First, an ELM is introduced as a dynamic single-input, single-output (SISO) linear model whose parameters, as well as the dynamic orders of its input and output signals, can change through ...
Journal title:
Volume, Issue:
Pages: -
Publication date: 2002